The Demography of Kinship

European Doctoral School of Demography, Paris, May 21-23, 2024

Authors

Diego Alburez-Gutierrez

Amanda Martins de Almeida

Published

September 16, 2024

Course description:

Kinship is a fundamental property of human populations and a key form of social structure. Demographers have long been interested in the interplay between demographic change and family configuration. This has led to the development of sophisticated methodological and conceptual approaches for the study of kinship, some of which are reviewed in this course.

Some useful things to know:

Getting started with matrix kinship models in R using DemoKin

The computer lab sessions will start soon, so it would be great to have the R environment prepared in advance. First, you will need R and RStudio installed. Second, install DemoKin, which is now available on CRAN.

1. Installation

#install.packages("DemoKin")
library(DemoKin)

Load other packages that will be useful:

library(dplyr)
library(tidyr)
library(ggplot2)

2. Built-in data

The DemoKin package includes data from Sweden as an example. This comes from the Human Mortality Database and Human Fertility Database.

2.1. swe_px matrix; survival probabilities by age (DemoKin’s p argument)

First we have survival probabilities by age:

data("swe_px", package="DemoKin")

swe_px[1:5, 1:5]
     1900    1901    1902    1903    1904
0 0.91060 0.90673 0.92298 0.91890 0.92357
1 0.97225 0.97293 0.97528 0.97549 0.97847
2 0.98525 0.98579 0.98630 0.98835 0.98921
3 0.98998 0.98947 0.99079 0.99125 0.99226
4 0.99158 0.99133 0.99231 0.99352 0.99272

It has years in columns and ages in rows. Plotting \(q_x\) (the complement of \(p_x\)) over age for 2015 gives:

swe_px %>%
    as.data.frame() %>%
    select(px = `2015`) %>%
    mutate(ages = 1:nrow(swe_px)-1) %>%
    ggplot() +
    geom_line(aes(x = ages, y = 1-px)) +
    scale_y_log10()
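
The age index in the plotting code relies on R's operator precedence: `1:nrow(swe_px)-1` is evaluated as `(1:n) - 1`, which yields ages starting at 0. A minimal base-R sketch with made-up survival probabilities (the vector toy_px is hypothetical, not package data) shows this together with the complement \(q_x = 1 - p_x\):

```r
# Toy survival probabilities for ages 0-4 (illustrative values, not real data)
toy_px <- c(0.990, 0.998, 0.999, 0.999, 0.999)

# ':' binds tighter than '-', so this is (1:n) - 1, i.e. ages 0, 1, ..., n - 1
ages <- 1:length(toy_px) - 1

# Equivalent and arguably easier to read
ages_alt <- seq_along(toy_px) - 1

# Probability of dying is the complement of the survival probability
toy_qx <- 1 - toy_px

data.frame(age = ages, px = toy_px, qx = toy_qx)
```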

2.2. swe_asfr matrix; age specific fertility rate (DemoKin’s f argument)

And age-specific fertility rates:

data("swe_asfr", package="DemoKin")

swe_asfr[15:20, 1:4]
      1900    1901    1902    1903
14 0.00013 0.00006 0.00008 0.00008
15 0.00053 0.00054 0.00057 0.00057
16 0.00275 0.00319 0.00322 0.00259
17 0.00932 0.00999 0.00965 0.00893
18 0.02328 0.02337 0.02347 0.02391
19 0.04409 0.04357 0.04742 0.04380

Plotted over age for the same year (2015):

swe_asfr %>%
      as.data.frame() %>%
      select(fx = `2015`) %>%
      mutate(age = 1:nrow(swe_asfr)-1) %>%
      ggplot() +
      geom_line(aes(x = age, y = fx))

3. ‘Keyfitz’ kinship diagram

We can visualize the implied kin counts for a Focal girl aged 5 yo in a time-invariant population using a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005) with the plot_diagram function:

First, get vectors for a given year:

swe_surv_2015 <- DemoKin::swe_px[,"2015"]
swe_asfr_2015 <- DemoKin::swe_asfr[,"2015"]

Run kinship models

swe_2015 <- kin(p = swe_surv_2015, f = swe_asfr_2015, time_invariant = TRUE)
swe_2015$kin_summary %>% 
  filter(age_focal == 5) %>% 
  select(kin, count = count_living) %>% 
  plot_diagram(rounding = 2)

4. What if we want to work with other countries?

We can estimate rates from any country in the world, produced by the World Population Prospects project. Pick any country (feel free to pick at random or pick the one you are interested in) and download the following data from:

After downloading the data for other countries, please download the entire GitHub repository here and save it to your computer. In the materials you have downloaded for this course, there is a folder called docs:

  • Create a folder named unwppdata inside the folder docs;

  • Store the WPP data in the folder you created.

You can look up the country names here.

Let’s look at the case of Brazil in 2015:

#load function
source("UNWPP_data.R")

#select country, year and sex
data <- UNWPP_data(country = "Brazil",
                   start_year =  2015,
                   end_year = 2015,
                   sex = "Female")

We have to reshape fertility and mortality into matrices that DemoKin can use (i.e., matrices with years as columns and ages as rows):

Reshape fertility

country_fert <- data %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

Reshape survival

country_surv <- data %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()
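
The reshaping step takes a "long" table (one row per age and year) and spreads it into an age-by-year matrix. A base-R sketch of the same idea with made-up values follows; the data frame toy_long is hypothetical and only mimics the age, year and px columns used above.

```r
# Toy long-format data: 3 ages x 2 years (illustrative values only)
toy_long <- data.frame(
  age  = rep(0:2, times = 2),
  year = rep(c(2014, 2015), each = 3),
  px   = c(0.980, 0.990, 0.995, 0.985, 0.992, 0.996)
)

# Spread into a matrix with ages as rows and years as columns;
# each age-year cell holds a single value, so mean() just returns it
toy_wide <- tapply(toy_long$px,
                   list(age = toy_long$age, year = toy_long$year),
                   mean)

toy_wide
dim(toy_wide)   # number of ages (rows) and years (columns)
```

Unlike the pivot_wider() pipeline, this version keeps the ages as row names, which can help when checking that rows and ages line up.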

5. ‘Keyfitz’ kinship diagram

We can visualize the implied kin counts for a Focal girl aged 5 yo in a time-invariant population using a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005) with the plot_diagram function:

Run kinship models

br_2015 <- 
  kin(p = country_surv, f = country_fert, time_invariant = TRUE)
br_2015$kin_summary %>% 
  filter(age_focal == 5) %>% 
  select(kin, count = count_living) %>% 
  plot_diagram(rounding = 2)

6. Exercise

6.1 Using the WPP data you downloaded, build the ‘Keyfitz’ kinship diagram for a Focal girl aged 5 yo in a time-invariant population. Discuss (with 2 or 3 colleagues) the results in relation to what we have seen for Sweden and Brazil: identify the main differences and the reasons for them.
6.2 We saw the ‘Keyfitz’ kinship diagram for a Focal girl aged 5 yo, but what about an older Focal, for example a woman aged 60? What would you expect? Build the kinship diagram for this Focal for Sweden, Brazil and the country of your choice and discuss.

Now share with the class what you have discussed in your groups!

7. Access UN WPP data via API

(Unfortunately, this approach is currently not working; the code is kept here for your information.)

Thanks to the DataPortal API from the UN we can download estimated rates from any country in the world, produced by the World Population Prospects project.

Let’s see the case of Brazil as an example:

#load function
source("get_UNWPP_inputs.R")

#pick country
country <- c("Brazil")

#Year range
my_startyr   <- 1950
my_endyr     <- 2022

#data download
data <- get_UNWPP_inputs(
  countries = country,
  my_startyr = my_startyr,
  my_endyr = my_endyr)

In today’s session we will look at the DemoKin functions and at how to run a one-sex, time-invariant model.

Load the packages and download the data

library(dplyr)
library(tidyr)
library(ggplot2)
library(DemoKin)

Today (and tomorrow) we will use the Brazilian data from WPP that we downloaded yesterday. Let’s select the data we want again:

#load function
source("UNWPP_data.R")

#select country, year and sex
data <- UNWPP_data(country = "Brazil",
                   start_year =  2015,
                   end_year = 2015,
                   sex = "Female")

1. The function kin()

DemoKin can be used to compute the number and age distribution of Focal’s relatives under a range of assumptions, including living and deceased kin. The function DemoKin::kin() currently does most of the heavy lifting in terms of implementing matrix kinship models.

This is what it looks like in action, in this case assuming time-invariant demographic rates:

# First, reshape fertility and survival for a given year

br_asfr_2015 <- data %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

br_surv_2015 <- data %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()
# Run kinship models

br_2015 <- kin(p = br_surv_2015, f = br_asfr_2015, time_invariant = TRUE)
1.1. Arguments
  • p numeric. A vector (atomic) or matrix of survival probabilities with rows as ages (and columns as years in case of matrix).
  • f numeric. Same as p but for fertility rates.
  • time_invariant logical. Assume time-invariant rates. Default TRUE.
  • output_kin character. kin types to return: “m” for mother, “d” for daughter, …
1.2. Relative types

Relatives for the output_kin argument are identified by a unique code. Note that the relationship codes used in DemoKin differ from those in Caswell (2019). The equivalence between the two sets of codes is given in the following table:

demokin_codes
   DemoKin Caswell               Labels_female                   Labels_male
1      coa       t    Cousins from older aunts     Cousins from older uncles
2      cya       v  Cousins from younger aunts   Cousins from younger uncles
3        c    <NA>                     Cousins                       Cousins
4        d       a                   Daughters                      Brothers
5       gd       b             Grand-daughters                    Grand-sons
6      ggd       c       Great-grand-daughters              Great-grand-sons
7      ggm       h          Great-grandmothers            Great-grandfathers
8       gm       g                Grandmothers                  Grandfathers
9        m       d                      Mother                        Father
10     nos       p   Nieces from older sisters   Nephews from older brothers
11     nys       q Nieces from younger sisters Nephews from younger brothers
12       n    <NA>                      Nieces                       Nephews
13      oa       r     Aunts older than mother     Uncles older than fathers
14      ya       s   Aunts younger than mother    Uncles younger than father
15       a    <NA>                       Aunts                        Uncles
16      os       m               Older sisters                Older brothers
17      ys       n             Younger sisters              Younger brothers
18       s    <NA>                     Sisters                      Brothers
                         Labels_2sex
1    Cousins from older aunts/uncles
2  Cousins from younger aunts/uncles
3                            Cousins
4                           Siblings
5                    Grand-childrens
6              Great-grand-childrens
7                Great-grandfparents
8                       Grandparents
9                            Parents
10      Niblings from older siblings
11    Niblings from younger siblings
12                          Niblings
13   Aunts/Uncles older than parents
14 Aunts/Uncles younger than parents
15                      Aunts/Uncles
16                    Older siblings
17                  Younger siblings
18                          Siblings
1.3. Value

DemoKin::kin() returns a list containing two data frames: kin_full and kin_summary.

str(br_2015)
List of 2
 $ kin_full   : tibble [142,814 × 7] (S3: tbl_df/tbl/data.frame)
  ..$ kin      : chr [1:142814] "d" "d" "d" "d" ...
  ..$ age_kin  : int [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ age_focal: int [1:142814] 0 1 2 3 4 5 6 7 8 9 ...
  ..$ living   : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ dead     : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ cohort   : logi [1:142814] NA NA NA NA NA NA ...
  ..$ year     : logi [1:142814] NA NA NA NA NA NA ...
 $ kin_summary: tibble [1,414 × 9] (S3: tbl_df/tbl/data.frame)
  ..$ age_focal     : int [1:1414] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ kin           : chr [1:1414] "coa" "cya" "d" "gd" ...
  ..$ year          : logi [1:1414] NA NA NA NA NA NA ...
  ..$ count_living  : num [1:1414] 0.2549 0.0831 0 0 0 ...
  ..$ mean_age      : num [1:1414] 11.37 5.52 NaN NaN NaN ...
  ..$ sd_age        : num [1:1414] 8.06 4.71 NaN NaN NaN ...
  ..$ count_dead    : num [1:1414] 0.000234 0.000148 0 0 0 ...
  ..$ count_cum_dead: num [1:1414] 0.000234 0.000148 0 0 0 ...
  ..$ mean_age_lost : num [1:1414] 0 0 NaN NaN NaN 0 0 0 0 NaN ...
kin_full

This data frame contains expected kin counts by year (or cohort), age of Focal, and age of kin.

head(br_2015$kin_full)
# A tibble: 6 × 7
  kin   age_kin age_focal living  dead cohort year 
  <chr>   <int>     <int>  <dbl> <dbl> <lgl>  <lgl>
1 d           0         0      0     0 NA     NA   
2 d           0         1      0     0 NA     NA   
3 d           0         2      0     0 NA     NA   
4 d           0         3      0     0 NA     NA   
5 d           0         4      0     0 NA     NA   
6 d           0         5      0     0 NA     NA   
kin_summary

This is a ‘summary’ data frame derived from kin_full. To produce it, we sum over all ages of kin to produce a data frame of expected kin counts by year or cohort and age of Focal (but not by age of kin). This is how the kin_summary object is derived:

kin_by_age_focal <- 
  br_2015$kin_full %>% 
  group_by(kin, age_focal) %>% 
  summarise(count = sum(living)) %>% 
  ungroup()

# Check that they are identical (for living kin only here)

kin_by_age_focal %>% 
  select(kin, age_focal, count) %>% 
  identical(
    br_2015$kin_summary %>% 
      select(kin, age_focal, count = count_living) %>% 
      arrange(kin, age_focal)
  )
[1] TRUE

2. Example: kin counts in time-invariant populations

Following Caswell (2019), we assume a female closed population in which everyone experiences the Brazilian 2015 mortality and fertility rates at each age throughout their life. We then ask:

How can we characterize the kinship network of an average member of the population (call her ‘Focal’)?

output_kin <- c("c", "d", "gd", "ggd", "ggm", "gm", "m", "n", "a", "s")

# Run kinship models
br_2015 <- kin(p = br_surv_2015, f = br_asfr_2015, output_kin = output_kin, time_invariant = TRUE)
2.1. Living kin

Now, let’s visualize how the expected number of daughters, siblings, cousins, etc., changes over the life course of Focal (now, with full names to identify each relative type using the function DemoKin::rename_kin()).

br_2015$kin_summary %>%
  rename_kin() %>%
  ggplot() +
  geom_line(aes(age_focal, count_living))  +
  geom_vline(xintercept = 35, color=2)+
  theme_bw() +
  labs(x = "Focal's age") +
  facet_wrap(~kin_label)
Joining with `by = join_by(kin)`

Note that we are working in a time invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).

How does overall family size (and family composition) vary over life for an average woman who survives to each age?

counts <- 
  br_2015$kin_summary %>%
  group_by(age_focal) %>% 
  summarise(count_living = sum(count_living)) %>% 
  ungroup()

br_2015$kin_summary %>%
  select(age_focal, kin, count_living) %>% 
  rename_kin() %>% 
  ggplot(aes(x = age_focal, y = count_living)) +
  geom_area(aes(fill = kin_label), colour = "black") +
  geom_line(data = counts, linewidth = 2) +
  labs(x = "Focal's age", y = "Number of living female relatives") +
  coord_cartesian(ylim = c(0, 6)) +
  theme_bw() +
  theme(legend.position = "bottom")
Joining with `by = join_by(kin)`

2.2. Age distribution of living kin

How old are Focal’s relatives? Using the kin_full data frame, we can visualize the age distribution of Focal’s relatives throughout Focal’s life. For example, when Focal is 35, what are the ages of her relatives?

br_2015$kin_full %>%
  rename_kin() %>%
  filter(age_focal == 35) %>%
  ggplot() +
  geom_line(aes(age_kin, living)) +
  geom_vline(xintercept = 35, color=2) +
  labs(y = "Expected number of living relatives") +
  theme_bw() +
  facet_wrap(~kin_label)
Joining with `by = join_by(kin)`

2.3. Deceased kin

We have focused on living kin, but what about relatives who have died during Focal’s life? The output of kin also includes information on kin deaths experienced by Focal.

We start by considering the number of kin deaths that Focal can expect to experience at each age. In other words, the non-cumulative number of deaths in the family that Focal experiences at a given age.

loss1 <- 
  br_2015$kin_summary %>%
  filter(age_focal>0) %>%
  group_by(age_focal) %>% 
  summarise(count_dead = sum(count_dead)) %>% 
  ungroup()

br_2015$kin_summary %>%
  rename_kin() %>%
  filter(age_focal > 0) %>%
  group_by(age_focal, kin_label) %>%
  summarise(count_dead = sum(count_dead)) %>%
  ungroup() %>%
  ggplot(aes(x = age_focal, y = count_dead)) +
    geom_area(aes(fill = kin_label), colour = "black") +
    geom_line(data = loss1, linewidth = 2) +
    labs(x = "Focal's age", y = "Number of kin deaths experienced at each age") +
    coord_cartesian(ylim = c(0, 0.086)) +
    theme_bw() +
    theme(legend.position = "bottom")

Now, we combine all kin types to show the cumulative burden of kin death for an average member of the population surviving to each age:

loss2 <- 
  br_2015$kin_summary %>%
  group_by(age_focal) %>% 
  summarise(count_cum_dead = sum(count_cum_dead)) %>% 
  ungroup()

br_2015$kin_summary %>%
  rename_kin() %>% 
  group_by(age_focal, kin_label) %>% 
  summarise(count_cum_dead = sum(count_cum_dead)) %>% 
  ungroup() %>% 
 ggplot(aes(x = age_focal, y = count_cum_dead)) +
  geom_area(aes(fill = kin_label), colour = "black") +
  geom_line(data = loss2, aes(y = count_cum_dead), linewidth = 2) +
  labs(x = "Focal's age", y = "Number of kin deaths experienced (cumulative)") +
  theme_bw() +
  theme(legend.position = "bottom")
Joining with `by = join_by(kin)`
`summarise()` has grouped output by 'age_focal'. You can override using the
`.groups` argument.
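
The cumulative measure can be read as a running total of the age-specific one. The sketch below uses made-up numbers to illustrate that relationship; that count_cum_dead is built exactly this way inside DemoKin is an assumption here, not something stated in the text.

```r
# Toy age-specific expected kin deaths for Focal aged 0-4 (illustrative values)
toy_dead <- data.frame(
  age_focal  = 0:4,
  count_dead = c(0.00, 0.01, 0.02, 0.02, 0.03)
)

# Running total over Focal's age: deaths experienced up to and including each age
toy_dead$count_cum_dead <- cumsum(toy_dead$count_dead)

toy_dead
```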

By ages 15, 50, and 65, an average member of the population will have experienced, respectively, the following cumulative numbers of kin deaths:

loss2 %>%
  filter(age_focal %in% c(15, 50, 65)) %>%
  select(count_cum_dead) %>%
  pull(count_cum_dead) %>%
  round(1) %>%
  paste(collapse = ", ")
[1] "0.6, 2.3, 3.3"

3. Exercises

For all exercises, assume time-invariant rates at the 2015 levels in the country of your choice and a female-only population.

3.1 Offspring availability and loss

Use DemoKin (assuming time-invariant rates at the 2015 levels in the country of your choice and a female-only population) to explore offspring survival and loss for mothers. Answer: What is the expected number of surviving offspring for an average woman aged 65?

Answer: What is the cumulative number of offspring deaths experienced by an average woman who survives to age 65?

3.2 Mean age of kin

The output of DemoKin::kin includes information on the average age of Focal’s relatives (in the columns kin_summary$mean_age and kin_summary$sd_age). For example, this allows us to determine the mean age, standard deviation and coefficient of variation of Focal’s sisters over Focal’s life-course:

br_2015$kin_summary %>%
  filter(kin %in% c("os", "ys")) %>%
  rename_kin() %>%
  select(kin, age_focal, mean_age, sd_age) %>%
  mutate(`sd_age/mean_age` = sd_age/mean_age) %>%
  pivot_longer(mean_age:`sd_age/mean_age`) %>%
  ggplot(aes(x = age_focal, y = value, colour = kin)) +
  geom_line() +
  facet_wrap(~name, scales = "free") +
  labs(y = "Mean age of sister(s)") +
  theme_bw()

3.2.1 Using only the raw output in kin_full, get the mean age of living mother, daughter and sisters for a female aged 35.

4 Living mother

What is the probability that Focal (an average woman in your country of choice) has a living mother over Focal’s life?

Instructions

Use DemoKin to obtain \(M_1(a)\), the probability of having a living mother at age \(a\) in a stable population. Conditional on Focal’s survival, \(M_1(a)\) can be thought of as a survival probability in a life table: it has to be equal to one when \(a\) is equal to zero (the mother is alive when she gives birth), and it decreases monotonically to zero.

Answer: What is the probability that Focal has a living mother when Focal turns 70 years old? And 25 years old?

In today’s session we will see how to apply the one-sex time-variant model and the two-sex models (both time-variant and time-invariant).

First, load required libraries:

library(DemoKin)
library(dplyr)
library(tidyr)
library(ggplot2)

1. Living kin

We saw yesterday the case of Brazil in 2015 assuming constant rates. But the demography of Brazil is, as you know, changing every year. This means that Focal and her relatives will have experienced changing mortality and fertility rates over time.

Let’s select the Brazilian mortality and fertility rates for a range of years.

#load function
source("UNWPP_data.R")

#select country, year and sex
data <- UNWPP_data(country = "Brazil",
                   start_year = 1950 ,
                   end_year = 2021,
                   sex = "Female")

#reshape fertility
br_asfr <- data %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

#reshape survival
br_px <- data %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()

The data we are using have years in columns and ages in rows. Here, we plot \(q_x\) (the complement of \(p_x\)) over age and time:

br_px %>%
    as.data.frame() %>%
    mutate(age = 1:nrow(br_px)-1) %>%
    pivot_longer(-age, names_to = "year", values_to = "px") %>%
    mutate(qx = 1-px) %>%
    ggplot() +
    geom_line(aes(x = age, y = qx, col = year)) +
    scale_y_log10() +
    theme(legend.position = "none")

Age-specific fertility rates:

br_asfr %>% as.data.frame() %>%
     mutate(age = 1:nrow(br_asfr)-1) %>%
     pivot_longer(-age, names_to = "year", values_to = "asfr") %>%
     mutate(year = as.integer(year)) %>%
     ggplot() + geom_tile(aes(x = year, y = age, fill = asfr)) +
     scale_x_continuous(breaks = seq(1900,2020,10), labels = seq(1900,2020,10))

And female population by age: (br_pop.RData available in the folder )

load("br_pop.RData")

br_pop %>% as.data.frame() %>%
     mutate(age = 1:nrow(br_pop)-1) %>%
     pivot_longer(-age, names_to = "year", values_to = "pop") %>%
     mutate(year = as.integer(year)) %>%
     ggplot() + geom_tile(aes(x = year, y = age, fill = pop)) +
     scale_x_continuous(breaks = seq(1900,2020,10), labels = seq(1900,2020,10))

With this input we can model kinship structure in Age-Period-Cohort (APC) dimensions:

1.1 Cohort approach

Let’s take a look at the resulting kin counts from a time-variant (argument time_invariant = FALSE) model for a Focal born in 1960, limiting the output to a selection of relatives (see argument output_kin) and a given cohort (argument output_cohort). Do you see any new parameter?

br_time_varying_1960_cohort <-
  DemoKin::kin(p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_cohort = 1960,
    output_kin = c("d","gd","ggd","m","gm","ggm"))
Preparing output...
Assuming stable population before 1950.
# plot
br_time_varying_1960_cohort$kin_summary %>%
  rename_kin() %>%
  ggplot(aes(age_focal,count_living)) +
  geom_line()+
  scale_y_continuous(name = "Expected number of living relatives",labels = seq(0,3,.2),breaks = seq(0,3,.2))+
  facet_wrap(~kin_label)+
  theme_bw()
Joining with `by = join_by(kin)`

These are the expected numbers of living kin for an average woman born in 1960, given the time-variant fertility, mortality and population distribution for the 1950-2021 period. If population is included as input, then \(\pi(t)\) (the age distribution of mothers) will be “observed”. Note the argument output_cohort = 1960, used to extract estimates for a given cohort of Focals (a diagonal in the Lexis diagram). This is a subset of all possible results (101 age classes and 72 years). Estimates stop at age 61 because we only provided (period) input data up to year 2021 (2021 - 1960 = 61).
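
The cut-off at age 61 follows from the identity year = cohort + age along a cohort's diagonal in the Lexis diagram. A small base-R sketch of that bookkeeping, using only the year range and cohort mentioned above (the variable names are illustrative):

```r
input_years <- 1950:2021   # years covered by the period input data
ages        <- 0:100       # age classes in the model
cohort      <- 1960        # birth year of Focal

# Calendar year in which Focal reaches each age
year_at_age <- cohort + ages

# Ages for which the corresponding year is covered by the input data
ages_covered <- ages[year_at_age %in% input_years]

range(ages_covered)   # youngest and oldest age with estimates
```

Setting cohort to 1990 in this sketch would give ages 0 to 31, so a later cohort can only be followed over a shorter stretch of life.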

Let us now compare across cohorts. We can, for example compare the 1960 and 1990 cohorts.

br_time_varying_1990_1960_cohort <-
  kin(p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_cohort = c(1990, 1960),
    output_kin = c("d","gd","ggd","m","gm","ggm"))
Preparing output...
Assuming stable population before 1950.
# plot
br_time_varying_1990_1960_cohort$kin_summary %>%
  rename_kin() %>%
  mutate(cohort = as.factor(cohort)) %>%
  ggplot(aes(age_focal,count_living,color=cohort)) +
  geom_line()+
  scale_y_continuous(name = "Expected number of living relatives",labels = seq(0,3,.2), breaks = seq(0,3,.2))+
  facet_wrap(~kin_label)+
  theme_bw()
Joining with `by = join_by(kin)`

1.2 Period approach

Maybe you are interested in taking a snapshot of the kin distribution in some year, for example 1960. You can do this by specifying the argument output_period = 1960.

br_time_varying_1960_period <-
  kin(
    p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_period = 1960,
    output_kin = c("d","gd","ggd","m","gm","ggm")
    )
Preparing output...
Assuming stable population before 1950.
# plot
br_time_varying_1960_period$kin_summary %>%
  rename_kin() %>%
  ggplot(aes(age_focal, count_living)) +
  geom_line() +
  scale_y_continuous(
    name = "Expected number of living relatives",
    limits = c(0, 5),  
    labels = seq(0, 5, 0.5),  
    breaks = seq(0, 5, 0.5)  
  ) +
  facet_wrap(~kin_label, scales = "free") +
  theme_bw()
Joining with `by = join_by(kin)`

Answer: Do these ‘period’ plots look similar to the ‘cohort’ plots shown above? When would you prefer a period over a cohort approach?

1.3 DemoKin doesn’t like cohort-period combinations

DemoKin will only return values for either periods OR cohorts, but never for period-cohort combinations. This is related to time/memory issues. E.g., providing all possible period-cohort estimates in our example would give a data frame with 119 x 101 x 101 x 14 ≈ 17 million rows.
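
The order of magnitude quoted above is easy to verify. In the sketch below, the labels attached to each factor (years, ages of Focal, ages of kin, kin types) are an interpretation of the numbers in the text, not something the text spells out.

```r
n_years     <- 119   # assumed meaning: number of years
n_age_focal <- 101   # assumed meaning: ages of Focal (0-100)
n_age_kin   <- 101   # assumed meaning: ages of kin (0-100)
n_kin_types <- 14    # assumed meaning: number of kin types

# Rows needed to store every combination
n_rows <- n_years * n_age_focal * n_age_kin * n_kin_types
n_rows
```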

Consider the following code, which will give an error since we are asking for both a cohort and period output at the same time:

kin(p = br_px,
    f = br_asfr,
    n = br_pop,
    time_invariant =FALSE,
    output_cohort = c(1960, 1990),
    output_period = 2000,
    output_kin = c("d","gd","ggd","m","gm","ggm"))
Error in kin(p = br_px, f = br_asfr, n = br_pop, time_invariant = FALSE, : sorry, you can not select cohort and period. Choose one please

2. Kin death

Kin loss can have severe consequences for bereaved relatives as it affects, for example, the provision of care support and intergenerational transfers over the life course. The function kin provides information on the number of relatives lost by Focal during her life, stored in the column kin_summary$count_cum_dead. The plot below compares patterns of kin loss for the 1960 and 1990 cohorts.

br_time_varying_1990_1960_cohort$kin_summary %>%
  rename_kin() %>%
  mutate(cohort = as.factor(cohort)) %>%
  ggplot() +
  geom_line(aes(age_focal, count_cum_dead, col = cohort)) +
  labs(y = "Expected number of deceased relatives") +
  theme_bw() +
  facet_wrap(~kin_label,scales="free")

Answer: Based on the previous plot, which kin types show the largest differences in terms of kin loss across the two cohorts? Discuss with regard to absolute and relative differences in the expected number of deaths by kin type.

Given these population-level measures, we can also compute Focal’s mean age at the time of her relatives’ deaths.

br_time_varying_1990_1960_cohort$kin_summary %>%
  rename_kin() %>%
  filter(age_focal == 30) %>%
  select(kin_label, cohort, mean_age_lost) %>%
  pivot_wider(names_from = cohort, values_from = mean_age_lost) %>%
  mutate_if(is.numeric, round, 1)
Joining with `by = join_by(kin)`
# A tibble: 6 × 3
  kin_label             `1960` `1990`
  <chr>                  <dbl>  <dbl>
1 Daughters               23.9   23.1
2 Grand-daughters        NaN    NaN  
3 Great-grand-daughters  NaN    NaN  
4 Great-grandmothers       9     10.4
5 Grandmothers            16.3   17.9
6 Mother                  17.1   17.8

Answer: Consider a Focal aged 30 in both cohorts: how would you describe the differences in terms of her mean age at kin loss for different relative types?

3. Age-classified two-sex kinship models and some exercises (we will do these in the afternoon)

Human males generally live shorter and reproduce later than females. These sex-specific processes affect kinship dynamics in a number of ways. For example, the degree to which an average member of the population, call her Focal, has a living grandparent is affected by differential mortality affecting the parental generation at older ages. We may also be interested in considering how kinship structures vary by Focal’s sex: a male Focal may have a different number of grandchildren than a female Focal given differences in fertility by sex. Documenting these differences matters since women often face greater expectations to provide support and informal care to relatives. As they live longer, they may find themselves at greater risk of having no living kin. The function kin2sex implements two-sex kinship models as introduced by Caswell (2022).

3.1 Demographic rates by sex

Data on male fertility by age are less common than data on female fertility. Using a sample of 160 countries, Schoumaker (2019) shows that the male Total Fertility Rate (TFR) is almost always higher than the female TFR, and that this gap decreases over the fertility transition.
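
With single-year age groups, the TFR is simply the sum of the age-specific fertility rates, and the mean age at childbearing is the rate-weighted average age. A sketch with made-up schedules (toy_asfr_f and toy_asfr_m are hypothetical, chosen only so that the male schedule is later and somewhat higher):

```r
ages <- 15:54

# Toy bell-shaped schedules (illustrative only): males peak later than females
toy_asfr_f <- 0.12 * exp(-((ages - 28)^2) / (2 * 5^2))
toy_asfr_m <- 0.11 * exp(-((ages - 32)^2) / (2 * 7^2))

# TFR: sum of single-year age-specific rates
tfr_f <- sum(toy_asfr_f)
tfr_m <- sum(toy_asfr_m)

# Mean age at childbearing: rate-weighted average age
mac_f <- sum(ages * toy_asfr_f) / sum(toy_asfr_f)
mac_m <- sum(ages * toy_asfr_m) / sum(toy_asfr_m)

round(c(tfr_f = tfr_f, tfr_m = tfr_m, mac_f = mac_f, mac_m = mac_m), 2)
```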

For this example, we use data from 2012 France (from Caswell (2022)) to exemplify the use of the two-sex function in DemoKin. Data on female and male fertility and mortality are included in the package.

age <- 0:100
ages <- length(age)
fra_fert_f <- fra_asfr_sex[,"ff"]
fra_fert_m <- fra_asfr_sex[,"fm"]
fra_surv_f <- fra_surv_sex[,"pf"]
fra_surv_m <- fra_surv_sex[,"pm"]

# plot
data.frame(value = c(fra_fert_f, fra_fert_m, fra_surv_f, fra_surv_m),
           age = rep(age, 4),
           sex = rep(c(rep("f", ages), rep("m", ages)), 2),
           risk = c(rep("fertility rate", ages * 2), rep("survival probability", ages * 2))) %>%
  ggplot(aes(age, value, col=sex)) +
  geom_line() +
  facet_wrap(~ risk, scales = "free_y") +
  theme_bw()

3.2 Time-invariant two-sex kinship models

We now introduce the function kin2sex, which is similar to the one-sex function kin (see ?kin) with two exceptions. First, the user needs to specify mortality and fertility by sex. Second, the user needs to indicate the sex of Focal (which is assumed to be female by default, as in the one-sex model). Let us first consider the application for time-invariant populations:

fra_kin_2sex <- kin2sex(
  pf = fra_surv_f,
  pm = fra_surv_m,
  ff = fra_fert_f,
  fm = fra_fert_m,
  time_invariant = TRUE,
  sex_focal = "f",
  birth_female = .5)
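
The birth_female argument can be read as the proportion of births that are female (0.5 in the call above). If you prefer to start from a sex ratio at birth (boys per girl), the conversion is straightforward; the value 1.04 below is purely illustrative and not taken from the course data.

```r
# Illustrative sex ratio at birth: 1.04 boys per girl (an assumed value)
srb <- 1.04

# Proportion of births that are female
birth_female <- 1 / (1 + srb)

round(birth_female, 3)
```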

The output of kin2sex is equivalent to that of kin, except that it includes a column sex_kin to specify the sex of the given relatives. Take a look with head(fra_kin_2sex$kin_summary).

A note on terminology: The function kin2sex uses the same codes as kin to identify relatives (see demokin_codes()). Note that when running a two-sex model, the code ‘m’ refers to either mothers or fathers! Use the column sex_kin to filter by the sex of a given relative. For example, in order to consider only sons and ignore daughters, use:

fra_kin_2sex$kin_summary %>%
  filter(kin == "d", sex_kin == "m") %>%
  head()
# A tibble: 6 × 11
  age_focal kin   sex_kin year  cohort count_living mean_age sd_age count_dead
      <int> <chr> <chr>   <lgl> <lgl>         <dbl>    <dbl>  <dbl>      <dbl>
1         0 d     m       NA    NA                0      NaN    NaN          0
2         1 d     m       NA    NA                0      NaN    NaN          0
3         2 d     m       NA    NA                0      NaN    NaN          0
4         3 d     m       NA    NA                0      NaN    NaN          0
5         4 d     m       NA    NA                0      NaN    NaN          0
6         5 d     m       NA    NA                0      NaN    NaN          0
# ℹ 2 more variables: count_cum_dead <dbl>, mean_age_lost <dbl>

Let’s group aunts and siblings and visualize the number of living kin by sex and Focal’s age.

kin_out <- fra_kin_2sex$kin_summary %>%
  mutate(kin = case_when(kin %in% c("os", "ys") ~ "s",
                         kin %in% c("ya", "oa") ~ "a",
                         T ~ kin)) %>%
  filter(kin %in% c("d", "m", "gm", "ggm", "s", "a"))

kin_out %>%
  summarise(count=sum(count_living), .by = c(kin, age_focal, sex_kin)) %>%
  rename_kin %>% 
  ggplot(aes(age_focal, count, fill=sex_kin))+
  geom_area()+
  theme_bw() +
  facet_wrap(~kin_label)

Information on kin availability by sex allows us to consider sex ratios, a traditional measure in demography, usually with females in the denominator. The following figure, for example, shows that a 25-year-old French woman in our hypothetical population can expect to have 0.5 grandfathers for every grandmother. Is it always the case that the sex ratio decreases with Focal’s age?

kin_out %>%
  group_by(kin, age_focal) %>%
  summarise(sex_ratio = sum(count_living[sex_kin=="m"], na.rm=TRUE)/sum(count_living[sex_kin=="f"], na.rm=TRUE)) %>%
  rename_kin %>% 
  ggplot(aes(age_focal, sex_ratio))+
  geom_line()+
  theme_bw() +
  facet_wrap(~kin_label, scales = "free")
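
To make the definition of the kin sex ratio concrete, here is a minimal sketch on a toy table. The counts are made up for illustration and do not come from DemoKin; only the column names mimic kin_summary.

```r
library(dplyr)

# Made-up expected numbers of living grandparents at two ages of Focal
toy_kin <- data.frame(
  kin          = rep("gm", 4),
  age_focal    = c(25, 25, 60, 60),
  sex_kin      = c("f", "m", "f", "m"),
  count_living = c(1.2, 0.6, 0.10, 0.02)
)

# Sex ratio = male kin / female kin, for each kin type and age of Focal
toy_ratio <- toy_kin %>%
  summarise(
    sex_ratio = sum(count_living[sex_kin == "m"]) /
                sum(count_living[sex_kin == "f"]),
    .by = c(kin, age_focal)
  )

toy_ratio
```

In this toy table the ratio falls from 0.5 at age 25 to 0.2 at age 60, because the made-up male counts shrink faster than the female ones.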

Answer the following: should the total number of living aunts be the same in the one-sex model as in the two-sex model? What about daughters?

The experience of kin loss for Focal depends on differences in mortality between the sexes. A female Focal starts losing fathers earlier than mothers. We see a slightly different pattern for grandparents, since Focal’s experience of grandparental loss depends on the initial availability of grandparents (i.e. if a grandparent died before Focal’s birth, she will never experience that death). What do you think?

kin_out %>%
  summarise(count=sum(count_dead), .by = c(kin, sex_kin, age_focal)) %>%
  rename_kin %>% 
  ggplot(aes(age_focal, count, col=sex_kin))+
  geom_line()+
  theme_bw() +
  facet_wrap(~kin_label)

3.3 Time-variant two-sex kinship models

We now look at populations where demographic rates are not static but change from year to year. For this, we extend the period using the data in “docs/fra_2sex.Rdata”, which you can load with the function load, as we did on previous days. These are UN data, so another exercise would be to use HMD and HFD data to go further back in time. For this example, we will ‘pretend’ that male fertility rates are the same as female fertility rates but at slightly older ages: we shift the female schedule by the difference in mean age at childbearing observed in 2012 (which you calculated before). There are actually some data for the period 1998-2013 in the HFD, but we keep things simple for now (using them would also require extrapolating level and pattern back in time).

load("fra_2sex.Rdata")
years <- ncol(fra_asfr_females)
ages <- nrow(fra_asfr_females)

# difference between sex in mean age in 2012
mac_females_2012 <- sum(0:100 * fra_fert_f)/sum(fra_fert_f)
mac_males_2012   <- sum(0:100 * fra_fert_m)/sum(fra_fert_m)
dif_mac_2012     <- trunc(mac_males_2012 - mac_females_2012)

# create a matrix of male fertility
fra_asfr_males <- matrix(0, ages, years)
colnames(fra_asfr_males) <- colnames(fra_asfr_females)
fra_asfr_males[(dif_mac_2012+1):ages,] <- fra_asfr_females[1:(ages-dif_mac_2012),]

# plot any year (ages start at 0)
age <- 0:(ages - 1)
plot(age, fra_asfr_females[,"1990"], type = "l", col = 2, ylab = "asfr")
lines(age, fra_asfr_males[,"1990"], col = 4)
legend("topright", c("females", "males"), col=c(2,4), lty=1)
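
The shift of the female schedule can be checked on a toy example. This is only a sketch: the rates, the age range and the 2-year shift are made up, and mac() is a helper defined here, not a DemoKin function.

```r
# Toy female fertility schedule for ages 0..7 (made-up numbers)
toy_ages <- 0:7
toy_ff   <- c(0, 0, 0.1, 0.2, 0.1, 0, 0, 0)
shift    <- 2  # assumed male-female difference in mean age at childbearing

# Male schedule: same shape, moved `shift` ages up
n_ages <- length(toy_ff)
toy_fm <- numeric(n_ages)
toy_fm[(shift + 1):n_ages] <- toy_ff[1:(n_ages - shift)]

# Mean age at childbearing (MAC) implied by a schedule
mac <- function(fx, ages) sum(ages * fx) / sum(fx)
mac_f <- mac(toy_ff, toy_ages)
mac_m <- mac(toy_fm, toy_ages)

mac_m - mac_f
```

The difference in mean ages equals the shift, and total fertility is unchanged. Note that shifting drops any fertility in the last `shift` ages of the female schedule, which is harmless only when those rates are zero.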

We now run the time-variant two-sex models (note the time_invariant = FALSE argument):

kin_out_time_variant <- kin2sex(
                      pf = fra_surv_females,
                      pm = fra_surv_males,
                      ff = fra_asfr_females,
                      fm = fra_asfr_males,
                      sex_focal = "f",
                      time_invariant = FALSE,
                      birth_female = .5,
                      output_cohort = 1950)
Preparing output...
Assuming stable population before 1950.

We can plot kin availability alongside values from a time-invariant model to show how demographic change matters: the time-variant models take into account changes derived from the demographic transition, whereas the time-invariant models assume never-changing rates. Are the effects the same for each sex?

kin_out_time_invariant <- kin2sex(
                      pf = fra_surv_females[,"1950"],
                      pm = fra_surv_males[,"1950"],
                      ff = fra_asfr_females[,"1950"],
                      fm = fra_asfr_males[,"1950"],
                      time_invariant = TRUE,
                      sex_focal = "f", birth_female = .5)

kin_out_time_variant$kin_summary %>%
  filter(cohort == 1950) %>% mutate(type = "variant") %>%
  bind_rows(kin_out_time_invariant$kin_summary %>% mutate(type = "invariant")) %>%
  mutate(kin = case_when(kin %in% c("ys", "os") ~ "s",
                         kin %in% c("ya", "oa") ~ "a",
                         TRUE ~ kin)) %>%
  filter(kin %in% c("d", "m", "gm", "ggm", "s", "a")) %>%
  group_by(type, kin, age_focal, sex_kin) %>%
  summarise(count=sum(count_living)) %>%
  rename_kin %>% 
  ggplot(aes(age_focal, count, linetype=type))+
  geom_line()+ theme_bw() +
  facet_grid(cols = vars(kin_label), rows=vars(sex_kin), scales = "free")

An interpretation note: we are not tracking lines of descent or ascent. This means, for example, that granddaughters cannot be differentiated by whether they are the offspring of Focal’s son or of Focal’s daughter. You can see this by looking at the DemoKin output and checking which quantities can be constructed from the variables actually available.

4. Exercises (let’s go through this in the afternoon session)

4.1 Living kin by sex

4.1.1 Download data for your country from the United Nations Population Division. This time, also use the parameter sex = "Male", because you will need male-specific survival patterns by age and time. Don’t forget to reshape the data to matrix format as we did yesterday. For this exercise, assume the same fertility pattern for males as for females and answer the questions in 4.1.2 and 4.1.3.

Let’s see an example for Brazil:

# load function
source("UNWPP_data.R")

#data download
data_females <- UNWPP_data(country = "Brazil",
                   start_year =  1950,
                   end_year = 2022,
                   sex = "Female")


data_males <- UNWPP_data(country = "Brazil",
                   start_year =  1950,
                   end_year = 2022,
                   sex = "Male")
# reshape again

br_asfr_females <- data_females %>%
  select(age, year, fx) %>%
  pivot_wider(names_from = year, values_from = fx) %>%
  select(-age) %>%
  as.matrix()

br_surv_females <- data_females %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()

br_surv_males <- data_males %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()
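
The reshaping step can be tried on a small made-up table before applying it to the downloaded data. The sketch below uses toy values and a toy object name (toy_long); it also labels the rows by age, since select(-age) drops that information.

```r
library(dplyr)
library(tidyr)

# Made-up long-format data: one row per age and year
toy_long <- expand.grid(age = 0:2, year = 1950:1951) %>%
  mutate(px = c(0.95, 0.99, 0.98, 0.96, 0.99, 0.985))

# Long to wide: ages in rows, years in columns
toy_mat <- toy_long %>%
  select(age, year, px) %>%
  pivot_wider(names_from = year, values_from = px) %>%
  select(-age) %>%
  as.matrix()

# Restore the age labels dropped by select(-age)
rownames(toy_mat) <- 0:2

dim(toy_mat)
toy_mat["0", "1951"]
```

Checking dim() and a few individual cells is a quick way to confirm that ages ended up in rows and years in columns.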

4.1.2 In a time-invariant model: how many living grandmothers and grandfathers can a woman expect to have at age 15 in 1950 and in 2015, and what are their mean ages? Draw a conclusion based on the results.

4.1.3 Compare ‘kin sex ratios’ of grandparents, parents, daughters and siblings in a time-variant framework for the cohort 1950, at each age of Focal.

5. What about countries for which we don’t have information on male fertility?

For the country of your choice, how many living kin by sex can a woman expect to have at age 25 in 2015? Run the following settings:

  • One-sex model; time-invariant rates; GKP factors

  • Two-sex model; time-variant rates; approximate male kin using the androgynous assumption (i.e., male fertility is equivalent to female fertility); use mortality rates for males and females
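
As a reminder of how GKP factors work, the sketch below multiplies one-sex (female) kin counts by a factor for each kin type to approximate counts for both sexes combined. The multipliers are our reading of Caswell (2022) and should be checked against the paper; the kin counts are made up and the object names are hypothetical.

```r
# Assumed GKP multipliers by DemoKin kin code (check against Caswell 2022)
gkp <- c(m = 2, gm = 4, ggm = 8, d = 2, gd = 4, ggd = 8,
         os = 2, ys = 2, nos = 4, nys = 4, oa = 4, ya = 4,
         coa = 8, cya = 8)

# Made-up one-sex output for a Focal of a given age
one_sex <- data.frame(
  kin          = c("m", "gm", "d", "os"),
  count_living = c(0.9, 0.6, 1.1, 0.4)
)

# Look up the factor for each kin code and scale the female-only count
one_sex$count_both_sexes <-
  one_sex$count_living * unname(gkp[as.character(one_sex$kin)])

one_sex
```

Note that GKP factors give totals for both sexes combined, not separate counts of male and female kin, which is one reason to compare them with the two-sex model.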

6. Extensions

For a detailed description of extensions of the matrix kinship model, see:

Description

You will use data on kinship structures to benchmark formal models of kinship. For this exercise, you will use the DemoKin R package to implement formal models of kinship. You should choose one country and run four different models according to the following specifications:

  • One-sex model; approximate male kin using GKP factors
    • time-invariant rates
    • time-variant rates
  • Two-sex model; approximate male kin using the androgynous assumption
    • time-invariant rates
    • time-variant rates

Use the output of the four models to answer the following questions:

  1. Plot the expected number of living relatives by age of focal for each specification. For extra points (i.e., this is optional), also plot the expected number of deceased relatives by age of focal.

  2. Discuss 1-2 key insights: when would you use the different specifications? Consider the specific context and the data available for the country you selected. (max 250 words)

  3. Can you think of other ways of incorporating male fertility into the kinship models (beyond the options we discussed in the course)? (max 250 words)

For the time-variant models, you will need population data for the argument “n”. Please download the data here. (backup link at Web Archive)

After downloading, please save the data in the unwppdata folder you already created.

# read_csv() comes from the readr package
library(readr)

# function to prepare the population data for DemoKin
wpp_pop <- function(country_name) {
  WPP2022_pop <- read_csv("unwppdata/WPP2022_Population1JanuaryBySingleAgeSex_Medium_1950-2021.zip")

  # female population: ages in rows, years in columns
  wpp <- WPP2022_pop %>%
    select(age = AgeGrpStart, country = Location, year = Time, pop = PopFemale) %>%
    filter(country == country_name) %>%
    pivot_wider(names_from = year, values_from = pop) %>%
    select(-age, -country) %>%
    as.matrix()

  # label rows by age, starting at 0
  row.names(wpp) <- seq_len(nrow(wpp)) - 1

  return(wpp)
}
br_pop <-
  wpp_pop("Brazil") #choose your country 
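
Before running a time-variant model, it can help to check that the population matrix covers the same ages and years as the rate matrices. The helper below is hypothetical (it is not part of DemoKin) and the matrices are toy examples; we assume here that all inputs should share the same dimensions and year labels.

```r
# Hypothetical helper: TRUE if all matrices have the same dimensions
# and the same column names (years)
same_shape <- function(...) {
  mats <- list(...)
  ref  <- mats[[1]]
  all(vapply(mats, function(m) {
    identical(dim(m), dim(ref)) && identical(colnames(m), colnames(ref))
  }, logical(1)))
}

# Toy matrices: ages 0..2 in rows, years in columns
toy_p <- matrix(0.99, nrow = 3, ncol = 2,
                dimnames = list(as.character(0:2), as.character(1950:1951)))
toy_f <- matrix(0.05, nrow = 3, ncol = 2,
                dimnames = list(as.character(0:2), as.character(1950:1951)))
toy_n <- matrix(1000, nrow = 3, ncol = 3,
                dimnames = list(as.character(0:2), as.character(1950:1952)))

ok_rates <- same_shape(toy_p, toy_f)         # same ages and years
ok_all   <- same_shape(toy_p, toy_f, toy_n)  # toy_n has one extra year
```

If the check fails for your own data, subset the matrices to the common years (for example with mat[, common_years]) before passing them to the model.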

Handing in the assignment

Assignments (one per group) should be sent by email to martins@demogr.mpg.de before midnight on Friday, May 24. You should hand in the following files:

  1. A .qmd file with all your code and answers to the exercise questions
  2. A compiled .pdf of your markdown file showing all the code

References

Caswell, Hal. 2019. “The Formal Demography of Kinship: A Matrix Formulation.” Demographic Research 41 (September): 679–712. https://doi.org/10.4054/DemRes.2019.41.24.
———. 2020. “The Formal Demography of Kinship II: Multistate Models, Parity, and Sibship.” Demographic Research 42 (June): 1097–1146. https://doi.org/10.4054/DemRes.2020.42.38.
———. 2022. “The Formal Demography of Kinship IV: Two-Sex Models and Their Approximations.” Demographic Research 47 (September): 359–96. https://doi.org/10.4054/DemRes.2022.47.13.
Caswell, Hal, and Xi Song. 2021. “The Formal Demography of Kinship. III. Kinship Dynamics with Time-Varying Demographic Rates.” Demographic Research 45: 517–46.
Keyfitz, Nathan, and Hal Caswell. 2005. Applied Mathematical Demography. New York: Springer.